Before we even choose a model algorithm for our species distribution model (SDM), we need to have data to fit it to. For an SDM we need two types of data: species observation records (presence-only, presence-absence, or abundance) and environmental data (such as bioclimatic variables or vegetation maps, in the form of rasters). In some instances we might be using data we recorded ourselves; at other times we might want to use remotely sensed data, or a combination of the two. There are multiple sources of data available, and zoon gives you the ability to access many of them.
With zoon you can use your own data, data sourced from online repositories, or the example occurrence and environmental datasets that zoon provides as modules. This tutorial will guide you through the data sources available within a zoon workflow().
zoon Modules
zoon comes with several pre-existing dataset modules that we can use. Using the GetModuleList() command we can see all of the Occurrence modules under the $occurrence sub-heading, and this includes the dataset modules. We can also view the Covariate modules under the $covariate sub-heading.
modules <- GetModuleList()
modules$occurrence # subset for the sake of screen space
## [1] "CarolinaWrenPO" "CWBZimbabwe"
## [3] "LocalOccurrenceData" "Lorem_ipsum_UK"
## [5] "NaiveRandomPresence" "NaiveRandomPresenceAbsence"
## [7] "SugarMaple" "UKAnophelesPlumbeus"
## [9] "CarolinaWrenPA" "NATrees"
## [11] "AnophelesPlumbeus" "SpOcc"
modules$covariate
## [1] "Bioclim_future" "CarolinaWrenRasters" "LocalRaster"
## [4] "NaiveRandomRaster" "UKBioclim" "NCEP"
## [7] "UKAir" "AirNCEP" "Bioclim"
For example, we could choose to fit a model to the Carolina Wren data using the CarolinaWrenPO or CarolinaWrenPA occurrence modules (presence-only and presence-absence, respectively) with the CarolinaWrenRasters covariate module.
While these datasets are perfectly usable, they are intended as examples rather than as a basis for real results. They are most useful for experimenting with zoon modules (want to explore the differences in model algorithms? Run them on these example dataset modules and compare the outputs), or as test datasets when building new modules of your own.
SDMs are commonly fit to datasets we have collected ourselves, and zoon has modules to help us load them. The two modules of interest here are LocalOccurrenceData for our observation records and LocalRaster for our raster-based data.
To ensure that all datasets loaded into a zoon workflow are compatible with the model modules, the LocalOccurrenceData module requires our data to be a .csv, .tab, or .xlsx file with a strict structure. The first and second columns are the longitude and latitude values (in that order), and the third column is the value of the observation (0 for absence, 1 for presence, and an integer for abundance data). If your coordinate system is not latitude/longitude then you can supply an optional fourth column called CRS that contains the proj4string for your coordinate system (e.g. “+init=epsg:27700” for easting/northing data). If no CRS column is supplied then latitude/longitude is assumed.
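As a minimal sketch of the structure described above, the following base R snippet builds a small table with longitude, latitude, and value columns (in that order), checks it, and writes it to a .csv file. The coordinates and values are made up for illustration, and the file is written to a temporary directory rather than a real project path.

```r
# Hypothetical example records: three presences and two absences.
records <- data.frame(
  longitude = c(-1.52, -0.98, -1.10, -0.75, -1.33),
  latitude  = c(51.20, 51.75, 51.48, 51.90, 51.05),
  value     = c(1, 1, 1, 0, 0)
)

# Check the structure described above before writing the file.
stopifnot(
  identical(names(records)[1:3], c("longitude", "latitude", "value")),
  all(records$longitude >= -180 & records$longitude <= 180),
  all(records$latitude >= -90 & records$latitude <= 90),
  all(records$value >= 0 & records$value == round(records$value))
)

# Write to a temporary file; in practice this would be your own path.
csv_path <- file.path(tempdir(), "myData.csv")
write.csv(records, csv_path, row.names = FALSE)

# Read it back to confirm the column order survived the round trip.
check <- read.csv(csv_path)
print(names(check))
```

The range checks assume latitude/longitude coordinates; data in another coordinate system would instead carry the optional CRS column and need different bounds.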
To use this module you call the occurrence module like this:
occurrence = LocalOccurrenceData(filename = "myData.csv",         # File path to your data file
                                 occurrenceType = "presence",     # The type of data you have
                                 columns = c(long = "longitude",  # The names of the columns in
                                             lat = "latitude",    # your .csv that match the
                                             value = "value"),    # required columns
                                 externalValidation = FALSE)      # Only required if validation
                                                                  # data is set up externally
Raster data loaded into a workflow using LocalRaster also follows a set format, but it is a simpler process than for occurrence data. This module reads in either a single raster or raster-stack, or a list or vector of rasters and creates a raster-stack.
To use this module you call the covariate module like this:
covariate = LocalRaster(rasters = c("myRaster1",   # File path to a raster
                                    "myRaster2"))  # File path to a second raster
covariate = LocalRaster(rasters = "myRasterStack") # A RasterStack object already loaded
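If you have no raster files to hand and want something small to experiment with, one option is to write ESRI ASCII grid files using base R alone. The sketch below does this with a hypothetical helper, write_ascii_grid(), which is not part of zoon or the raster package. It assumes that the raster package (and therefore LocalRaster) can read .asc files from a file path, which is worth confirming on your own installation. The cell values are random numbers, not real environmental data.

```r
# write_ascii_grid() is a hypothetical helper, not part of zoon or raster.
write_ascii_grid <- function(values, path, xll = -1, yll = 51, cellsize = 0.25) {
  stopifnot(is.matrix(values), is.numeric(values))
  header <- c(
    paste("ncols", ncol(values)),
    paste("nrows", nrow(values)),
    paste("xllcorner", xll),
    paste("yllcorner", yll),
    paste("cellsize", cellsize),
    paste("NODATA_value", -9999)
  )
  # Each matrix row becomes one line of the grid, written north to south.
  rows <- apply(values, 1, paste, collapse = " ")
  writeLines(c(header, rows), path)
  invisible(path)
}

# Two 4 x 4 grids of made-up values covering longitude -1 to 0, latitude 51 to 52.
set.seed(1)
raster_paths <- file.path(tempdir(), c("myRaster1.asc", "myRaster2.asc"))
for (p in raster_paths) {
  write_ascii_grid(matrix(round(runif(16, 0, 30), 1), nrow = 4), p)
}
print(file.exists(raster_paths))
```

The resulting raster_paths vector has the same shape as the vector of file paths passed to the rasters argument in the first call above.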
Sometimes we may need or want to source our occurrence and/or covariate data from online sources. The modules of interest here are SpOcc, Bioclim, Bioclim_future, and NCEP.
The SpOcc module is used to obtain species occurrence records from a selection of online databases. The available databases are GBIF, BISON, iNat, eBird, Ecoengine, and AntWeb. We can call this module like this:
occurrence = SpOcc(species = "SpeciesName",     # Species scientific name
                   extent = c(-1, 0, 51, 52),   # Coordinates for the extent of the region
                   databases = "gbif",          # Databases to use
                   type = "presence",           # Type of data you want
                   limit = 10000)               # A maximum limit of records to obtain
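The extent argument appears in several of the modules in this section. To the best of my understanding of the zoon module documentation, it is a numeric vector of length four in the order xmin, xmax, ymin, ymax, which is consistent with the examples in this tutorial. The sketch below uses a hypothetical helper, make_extent(), which is not part of zoon, to build such a vector from named arguments and catch swapped or out-of-range values. It assumes latitude/longitude coordinates.

```r
# make_extent() is a hypothetical helper, not part of zoon.
# It returns c(xmin, xmax, ymin, ymax) after basic sanity checks.
make_extent <- function(xmin, xmax, ymin, ymax) {
  stopifnot(
    is.numeric(c(xmin, xmax, ymin, ymax)),
    xmin < xmax, ymin < ymax,
    xmin >= -180, xmax <= 180,
    ymin >= -90, ymax <= 90
  )
  c(xmin, xmax, ymin, ymax)
}

# The same region as in the SpOcc call above.
study_extent <- make_extent(xmin = -1, xmax = 0, ymin = 51, ymax = 52)
print(study_extent)
```

Defining the extent once and reusing the same object for the occurrence and covariate modules is one way to keep the two regions consistent.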
The Bioclim module obtains bioclimatic variables from WorldClim. This data is available at various resolutions (2.5, 5, or 10 minutes), and there are 19 available variables. We can call this module like this:
covariate = Bioclim(extent = c(-180, 180, -90, 90),  # Coordinates for the extent of the region
                    resolution = 10,                 # Required resolution
                    layers = 1:5)                    # Variables we want (from 1 to 19)
We can also obtain these bioclimatic variables for predictions of the future using the Bioclim_future module. Anyone looking to obtain this data should first research Representative Concentration Pathways and General Circulation Models to make an informed decision about the predictions they are after. We can call this module like this:
covariate = Bioclim_future(extent = c(-10, 10, 45, 65),  # Coordinates of the extent of the region
                           resolution = 10,              # Resolution of the data
                           layers = 1:19,                # Required Bioclim variables
                           rcp = 45,                     # Representative Concentration Pathway
                           model = "AC",                 # General Circulation Model
                           year = 70)                    # Time period for the prediction
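The rcp and year codes in the call above are terse. As I recall from the documentation of raster::getData() for CMIP5 data, rcp takes one of 26, 45, 60, or 85 and year takes 50 or 70. The sketch below assumes that Bioclim_future accepts the same codes, which should be checked against the module's own help page. check_future_args() is a hypothetical helper, not part of zoon, that turns the codes into a readable label and rejects anything else.

```r
# check_future_args() is a hypothetical helper, not part of zoon.
# The accepted codes are an assumption based on raster::getData() for CMIP5.
check_future_args <- function(rcp, year) {
  rcp_labels  <- c("26" = "RCP2.6", "45" = "RCP4.5", "60" = "RCP6.0", "85" = "RCP8.5")
  year_labels <- c("50" = "2050", "70" = "2070")
  stopifnot(
    as.character(rcp) %in% names(rcp_labels),
    as.character(year) %in% names(year_labels)
  )
  paste(rcp_labels[[as.character(rcp)]], "around", year_labels[[as.character(year)]])
}

print(check_future_args(rcp = 45, year = 70))
```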
The NCEP module obtains environmental data from the National Centers for Environmental Prediction. We can call this module like this:
covariate = NCEP(extent = c(-5, 5, 50, 60),  # Coordinates of the extent of the region
                 variables = "hgt",          # Character vector of variables of interest
                 status.bar = FALSE)         # Show a status bar of download progress?
Now that we’ve seen the different ways of obtaining data in zoon, let’s see an example. Here we obtain presence-only data for the Grizzly Bear (Ursus arctos) in North America from GBIF, and bioclimatic variables from WorldClim.
Ursus_arctos_online <- workflow(occurrence = SpOcc(species = "Ursus arctos",
                                                   extent = c(-175, -65, 20, 75),
                                                   databases = "gbif",
                                                   type = "presence"),
                                covariate = Bioclim(extent = c(-175, -65, 20, 75),
                                                    resolution = 10,
                                                    layers = 1:19),
                                process = Chain(StandardiseCov, Background(1000)),
                                model = MaxEnt,
                                output = InteractiveOccurrenceMap)